This data package contains the data that powers the chart “Share of adults who smoke” on the Our World in Data website. The underlying indicator is part of the World Bank's World Development Indicators: https://datacatalog.worldbank.org/search/dataset/0037712/World-Development-Indicators
How does the prevalence of smoking among adults vary across the world in this dataset, and what trends or patterns emerge from the analysis?
The PSY6422_smoke repository is organized into four folders:

- `/codebook`: documentation of the dataset, including variable descriptions and structure, which gives essential context for the analysis.
- `/data`: the raw datasets used in this project, which form the basis of all analyses.
- `/figures`: the visualizations and plots created during the analysis, highlighting the project’s key findings.
- `/scripts`: all the code used for data processing, analysis, and visualization.

Together, these folders follow the project workflow from raw data to final outputs.
The raw dataset for this visualization project comes from: Multiple sources compiled by World Bank (2024), processed by Our World in Data. “Prevalence of current tobacco use (% of adults)” [dataset]. World Health Organization (via World Bank), “World Development Indicators” [original data].
The indicator measures the percentage of the population aged 15 years and over who currently use any tobacco product (smoked and/or smokeless tobacco) on a daily or non-daily basis. Tobacco products include cigarettes, pipes, cigars, cigarillos, waterpipes (hookah, shisha), bidis, kretek, heated tobacco products, and all forms of smokeless (oral and nasal) tobacco. Tobacco products exclude e-cigarettes (which do not contain tobacco), “e-cigars”, “e-hookahs”, JUUL and “e-pipes”. The rates are age-standardized to the WHO Standard Population.
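Age standardization re-weights age-specific rates by a fixed standard population. This lets countries with different age structures be compared fairly. Below is a minimal sketch of direct standardization. It assumes three hypothetical age bands with hypothetical rates and weights; these are not the actual WHO Standard Population weights, and the WHO's exact procedure may differ.

```r
# Direct age standardization: a weighted average of age-specific rates,
# using each age band's share of a standard population as the weight.
# All numbers below are hypothetical and for illustration only.
age_specific_rate <- c("15-34" = 30, "35-59" = 25, "60+" = 10)    # % using tobacco
standard_weights  <- c("15-34" = 0.5, "35-59" = 0.3, "60+" = 0.2) # hypothetical shares

age_standardize <- function(rates, weights) {
  # Rates and weights must refer to the same age bands, in the same order
  stopifnot(identical(names(rates), names(weights)))
  # Normalize the weights so they sum to 1, then take the weighted sum
  sum(rates * weights / sum(weights))
}

standardized_rate <- age_standardize(age_specific_rate, standard_weights)
standardized_rate
```

Because the weights are fixed, two countries with identical age-specific rates get the same standardized rate, even if one has a much older population.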
This consideration is important when interpreting the project’s results: estimates for countries with irregular surveys or many data gaps have large uncertainty ranges, so those results should be interpreted with caution.
``` r
# List of packages to install and load
# (ggplot2, tidyr and dplyr are also attached as part of the tidyverse)
packages <- c("tidyverse", "ggplot2", "tidyr", "dplyr", "plotly",
              "rnaturalearth", "rnaturalearthdata", "sf")

# Install each package if it is missing, then load it.
# require() attaches the package and returns FALSE (with a warning) when it
# is not installed, so library() is only needed after installing.
install_and_load <- function(packages) {
  for (package in packages) {
    if (!require(package, character.only = TRUE)) {
      install.packages(package, dependencies = TRUE)
      library(package, character.only = TRUE)
    }
  }
}

# Run the function
install_and_load(packages)
```
## Loading required package: tidyverse
## Warning: package 'ggplot2' was built under R version 4.4.2
## Warning: package 'tidyr' was built under R version 4.4.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Loading required package: plotly
## Warning: package 'plotly' was built under R version 4.4.2
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
##
## Loading required package: rnaturalearth
## Warning: package 'rnaturalearth' was built under R version 4.4.2
## Loading required package: rnaturalearthdata
## Warning: package 'rnaturalearthdata' was built under R version 4.4.2
##
## Attaching package: 'rnaturalearthdata'
##
## The following object is masked from 'package:rnaturalearth':
##
## countries110
##
## Loading required package: sf
## Warning: package 'sf' was built under R version 4.4.2
## Linking to GEOS 3.12.2, GDAL 3.9.3, PROJ 9.4.1; sf_use_s2() is TRUE
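An alternative to the `require()` loop takes three steps. First, check which packages are installed using base R's `requireNamespace()`. Then install only the missing ones in a single call. Finally, attach everything. A minimal sketch follows; the helper names `find_missing_packages()` and `install_missing_and_load()` are hypothetical.

```r
# Return the packages that cannot be loaded (i.e. are not installed).
# requireNamespace() loads a package's namespace without attaching it and
# returns FALSE, instead of raising an error, when the package is missing.
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install only the missing packages, then attach all of them.
install_missing_and_load <- function(pkgs) {
  missing <- find_missing_packages(pkgs)
  if (length(missing) > 0) {
    install.packages(missing, dependencies = TRUE)
  }
  invisible(lapply(pkgs, library, character.only = TRUE))
}

# Usage (commented out so the sketch does not install anything when run):
# install_missing_and_load(c("tidyverse", "plotly", "rnaturalearth",
#                            "rnaturalearthdata", "sf"))
```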
# Load data
``` r
# Load raw data
rawdata <- read.csv("data/smoking.csv")
```
To create the visualization, I needed some new variables. The `world` object comes from `ne_countries()` in the rnaturalearth package. That function returns country boundaries as an `sf` (simple features) object, which contains the geometry needed for the map. I then merged the world data and the smoking data into `map_data` via the three-letter ISO country code.
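The merge is a left join: every country in the map data is kept, and countries with no matching ISO code in the smoking data receive `NA` values. Below is a minimal sketch using base R's `merge()` on small hypothetical data frames; the project itself uses `dplyr::left_join()`.

```r
# Hypothetical stand-ins for the map data and the smoking data
world_toy <- data.frame(
  iso_a3 = c("CHL", "NOR", "VEN"),
  name   = c("Chile", "Norway", "Venezuela")
)
smoking_toy <- data.frame(
  Code       = c("CHL", "CHL", "NOR"),
  Year       = c(2000, 2020, 2020),
  Prevalence = c(40, 30, 15)  # hypothetical values
)

# all.x = TRUE keeps every row of world_toy (a left join);
# by.x / by.y match the ISO code columns, which have different names
map_toy <- merge(world_toy, smoking_toy,
                 by.x = "iso_a3", by.y = "Code", all.x = TRUE)
map_toy
```

A country with several years of data appears once per year, which is why the merged map data has many more rows than there are countries.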
After an initial sanity check, I cleaned the data. The dataset contains aggregate entities that are not individual countries, such as World Bank regions, income groups, the European Union and the World total, so I removed them before mapping. I also wanted to visualize the data in five-year steps, so I dropped 2018 and 2019. Finally, I renamed the prevalence column to `Prevalence` for ease of use and fixed the missing ISO codes.
``` r
# Clean data: remove aggregate entities (World Bank regions, income groups,
# the European Union and the World total) so that only countries remain
countries_data <- rawdata[!rawdata$Entity %in% c("East Asia and Pacific (WB)", "Sub-Saharan Africa (WB)",
                                                 "Upper-middle-income countries", "Europe and Central Asia (WB)",
                                                 "World", "European Union (27)", "Low-income countries",
                                                 "Lower-middle-income countries", "Middle East and North Africa (WB)",
                                                 "Middle-income countries", "North America (WB)", "South Asia (WB)",
                                                 "Latin America and Caribbean (WB)", "High-income countries"), ]
# Further clean data: exclude years 2018 and 2019
countries_data <- countries_data %>%
  filter(!(Year %in% c(2018, 2019)))
# Rename the long column name produced by read.csv()
countries_data <- countries_data %>%
  rename(Prevalence = Prevalence.of.current.tobacco.use....of.adults.)
```
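The long dotted column name comes from `read.csv()`. By default, it passes headers through `make.names()`, which replaces spaces, brackets and `%` with dots. Below is a minimal sketch on a hypothetical one-row CSV. It shows the mangled name, then an alternative that keeps the original header with `check.names = FALSE` and renames it in base R.

```r
# A hypothetical CSV with the same header layout as the smoking data
csv_text <- "Entity,Code,Year,Prevalence of current tobacco use (% of adults)
Chile,CHL,2000,40"

# Default behaviour: the header is converted into a syntactic name
default_names <- names(read.csv(text = csv_text))
default_names[4]

# Keep the header as-is, then rename the column by its original name
raw <- read.csv(text = csv_text, check.names = FALSE)
names(raw)[names(raw) == "Prevalence of current tobacco use (% of adults)"] <- "Prevalence"
names(raw)
```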
# Sanity check
``` r
# Load world map data as an sf (simple features) object for spatial data
world <- ne_countries(scale = "medium", returnclass = "sf")
# Check the structure of the geospatial data
head(world)
```
## Simple feature collection with 6 features and 168 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -73.36621 ymin: -22.40205 xmax: 109.4449 ymax: 41.9062
## Geodetic CRS: WGS 84
## featurecla scalerank labelrank sovereignt sov_a3 adm0_dif level
## 1 Admin-0 country 1 3 Zimbabwe ZWE 0 2
## 2 Admin-0 country 1 3 Zambia ZMB 0 2
## 3 Admin-0 country 1 3 Yemen YEM 0 2
## 4 Admin-0 country 3 2 Vietnam VNM 0 2
## 5 Admin-0 country 5 3 Venezuela VEN 0 2
## 6 Admin-0 country 6 6 Vatican VAT 0 2
## type tlc admin adm0_a3 geou_dif geounit gu_a3 su_dif
## 1 Sovereign country 1 Zimbabwe ZWE 0 Zimbabwe ZWE 0
## 2 Sovereign country 1 Zambia ZMB 0 Zambia ZMB 0
## 3 Sovereign country 1 Yemen YEM 0 Yemen YEM 0
## 4 Sovereign country 1 Vietnam VNM 0 Vietnam VNM 0
## 5 Sovereign country 1 Venezuela VEN 0 Venezuela VEN 0
## 6 Sovereign country 1 Vatican VAT 0 Vatican VAT 0
## subunit su_a3 brk_diff name name_long brk_a3 brk_name brk_group
## 1 Zimbabwe ZWE 0 Zimbabwe Zimbabwe ZWE Zimbabwe <NA>
## 2 Zambia ZMB 0 Zambia Zambia ZMB Zambia <NA>
## 3 Yemen YEM 0 Yemen Yemen YEM Yemen <NA>
## 4 Vietnam VNM 0 Vietnam Vietnam VNM Vietnam <NA>
## 5 Venezuela VEN 0 Venezuela Venezuela VEN Venezuela <NA>
## 6 Vatican VAT 0 Vatican Vatican VAT Vatican <NA>
## abbrev postal formal_en
## 1 Zimb. ZW Republic of Zimbabwe
## 2 Zambia ZM Republic of Zambia
## 3 Yem. YE Republic of Yemen
## 4 Viet. VN Socialist Republic of Vietnam
## 5 Ven. VE Bolivarian Republic of Venezuela
## 6 Vat. V State of the Vatican City
## formal_fr name_ciawf note_adm0 note_brk
## 1 <NA> Zimbabwe <NA> <NA>
## 2 <NA> Zambia <NA> <NA>
## 3 <NA> Yemen <NA> <NA>
## 4 <NA> Vietnam <NA> <NA>
## 5 República Bolivariana de Venezuela Venezuela <NA> <NA>
## 6 <NA> Holy See (Vatican City) <NA> <NA>
## name_sort name_alt mapcolor7 mapcolor8 mapcolor9 mapcolor13 pop_est
## 1 Zimbabwe <NA> 1 5 3 9 14645468
## 2 Zambia <NA> 5 8 5 13 17861030
## 3 Yemen, Rep. <NA> 5 3 3 11 29161922
## 4 Vietnam <NA> 5 6 5 4 96462106
## 5 Venezuela, RB <NA> 1 3 1 4 28515829
## 6 Vatican (Holy See) Holy See 1 3 4 2 825
## pop_rank pop_year gdp_md gdp_year economy
## 1 14 2019 21440 2019 5. Emerging region: G20
## 2 14 2019 23309 2019 7. Least developed region
## 3 15 2019 22581 2019 7. Least developed region
## 4 16 2019 261921 2019 5. Emerging region: G20
## 5 15 2019 482359 2014 5. Emerging region: G20
## 6 2 2019 -99 2019 2. Developed region: nonG7
## income_grp fips_10 iso_a2 iso_a2_eh iso_a3 iso_a3_eh iso_n3
## 1 5. Low income ZI ZW ZW ZWE ZWE 716
## 2 4. Lower middle income ZA ZM ZM ZMB ZMB 894
## 3 4. Lower middle income YM YE YE YEM YEM 887
## 4 4. Lower middle income VM VN VN VNM VNM 704
## 5 3. Upper middle income VE VE VE VEN VEN 862
## 6 2. High income: nonOECD VT VA VA VAT VAT 336
## iso_n3_eh un_a3 wb_a2 wb_a3 woe_id woe_id_eh woe_note
## 1 716 716 ZW ZWE 23425004 23425004 Exact WOE match as country
## 2 894 894 ZM ZMB 23425003 23425003 Exact WOE match as country
## 3 887 887 RY YEM 23425002 23425002 Exact WOE match as country
## 4 704 704 VN VNM 23424984 23424984 Exact WOE match as country
## 5 862 862 VE VEN 23424982 23424982 Exact WOE match as country
## 6 336 336 -99 -99 23424986 23424986 Exact WOE match as country
## adm0_iso adm0_diff adm0_tlc adm0_a3_us adm0_a3_fr adm0_a3_ru adm0_a3_es
## 1 ZWE <NA> ZWE ZWE ZWE ZWE ZWE
## 2 ZMB <NA> ZMB ZMB ZMB ZMB ZMB
## 3 YEM <NA> YEM YEM YEM YEM YEM
## 4 VNM <NA> VNM VNM VNM VNM VNM
## 5 VEN <NA> VEN VEN VEN VEN VEN
## 6 VAT <NA> VAT VAT VAT VAT VAT
## adm0_a3_cn adm0_a3_tw adm0_a3_in adm0_a3_np adm0_a3_pk adm0_a3_de adm0_a3_gb
## 1 ZWE ZWE ZWE ZWE ZWE ZWE ZWE
## 2 ZMB ZMB ZMB ZMB ZMB ZMB ZMB
## 3 YEM YEM YEM YEM YEM YEM YEM
## 4 VNM VNM VNM VNM VNM VNM VNM
## 5 VEN VEN VEN VEN VEN VEN VEN
## 6 VAT VAT VAT VAT VAT VAT VAT
## adm0_a3_br adm0_a3_il adm0_a3_ps adm0_a3_sa adm0_a3_eg adm0_a3_ma adm0_a3_pt
## 1 ZWE ZWE ZWE ZWE ZWE ZWE ZWE
## 2 ZMB ZMB ZMB ZMB ZMB ZMB ZMB
## 3 YEM YEM YEM YEM YEM YEM YEM
## 4 VNM VNM VNM VNM VNM VNM VNM
## 5 VEN VEN VEN VEN VEN VEN VEN
## 6 VAT VAT VAT VAT VAT VAT VAT
## adm0_a3_ar adm0_a3_jp adm0_a3_ko adm0_a3_vn adm0_a3_tr adm0_a3_id adm0_a3_pl
## 1 ZWE ZWE ZWE ZWE ZWE ZWE ZWE
## 2 ZMB ZMB ZMB ZMB ZMB ZMB ZMB
## 3 YEM YEM YEM YEM YEM YEM YEM
## 4 VNM VNM VNM VNM VNM VNM VNM
## 5 VEN VEN VEN VEN VEN VEN VEN
## 6 VAT VAT VAT VAT VAT VAT VAT
## adm0_a3_gr adm0_a3_it adm0_a3_nl adm0_a3_se adm0_a3_bd adm0_a3_ua adm0_a3_un
## 1 ZWE ZWE ZWE ZWE ZWE ZWE -99
## 2 ZMB ZMB ZMB ZMB ZMB ZMB -99
## 3 YEM YEM YEM YEM YEM YEM -99
## 4 VNM VNM VNM VNM VNM VNM -99
## 5 VEN VEN VEN VEN VEN VEN -99
## 6 VAT VAT VAT VAT VAT VAT -99
## adm0_a3_wb continent region_un subregion
## 1 -99 Africa Africa Eastern Africa
## 2 -99 Africa Africa Eastern Africa
## 3 -99 Asia Asia Western Asia
## 4 -99 Asia Asia South-Eastern Asia
## 5 -99 South America Americas South America
## 6 -99 Europe Europe Southern Europe
## region_wb name_len long_len abbrev_len tiny homepart
## 1 Sub-Saharan Africa 8 8 5 -99 1
## 2 Sub-Saharan Africa 6 6 6 -99 1
## 3 Middle East & North Africa 5 5 4 -99 1
## 4 East Asia & Pacific 7 7 5 2 1
## 5 Latin America & Caribbean 9 9 4 -99 1
## 6 Europe & Central Asia 7 7 4 4 1
## min_zoom min_label max_label label_x label_y ne_id wikidataid
## 1 0 2.5 8.0 29.92544 -18.911640 1159321441 Q954
## 2 0 3.0 8.0 26.39530 -14.660804 1159321439 Q953
## 3 0 3.0 8.0 45.87438 15.328226 1159321425 Q805
## 4 0 2.0 7.0 105.38729 21.715416 1159321417 Q881
## 5 0 2.5 7.5 -64.59938 7.182476 1159321411 Q717
## 6 0 5.0 10.0 12.45342 41.903323 1159321407 Q237
## name_ar name_bn name_de name_en name_es
## 1 زيمبابوي জিম্বাবুয়ে Simbabwe Zimbabwe Zimbabue
## 2 زامبيا জাম্বিয়া Sambia Zambia Zambia
## 3 اليمن ইয়েমেন Jemen Yemen Yemen
## 4 فيتنام ভিয়েতনাম Vietnam Vietnam Vietnam
## 5 فنزويلا ভেনেজুয়েলা Venezuela Venezuela Venezuela
## 6 الفاتيكان ভ্যাটিকান সিটি Vatikanstadt Vatican City Ciudad del Vaticano
## name_fa name_fr name_el name_he name_hi name_hu
## 1 زیمبابوه Zimbabwe Ζιμπάμπουε זימבבואה ज़िम्बाब्वे Zimbabwe
## 2 زامبیا Zambie Ζάμπια זמביה ज़ाम्बिया Zambia
## 3 یمن Yémen Υεμένη תימן यमन Jemen
## 4 ویتنام Viêt Nam Βιετνάμ וייטנאם वियतनाम Vietnám
## 5 ونزوئلا Venezuela Βενεζουέλα ונצואלה वेनेज़ुएला Venezuela
## 6 واتیکان Cité du Vatican Βατικανό קריית הוותיקן वैटिकन नगर Vatikán
## name_id name_it name_ja name_ko name_nl name_pl
## 1 Zimbabwe Zimbabwe ジンバブエ 짐바브웨 Zimbabwe Zimbabwe
## 2 Zambia Zambia ザンビア 잠비아 Zambia Zambia
## 3 Yaman Yemen イエメン 예멘 Jemen Jemen
## 4 Vietnam Vietnam ベトナム 베트남 Vietnam Wietnam
## 5 Venezuela Venezuela ベネズエラ 베네수엘라 Venezuela Wenezuela
## 6 Vatikan Città del Vaticano バチカン 바티칸 시국 Vaticaanstad Watykan
## name_pt name_ru name_sv name_tr name_uk name_ur
## 1 Zimbábue Зимбабве Zimbabwe Zimbabve Зімбабве زمبابوے
## 2 Zâmbia Замбия Zambia Zambiya Замбія زیمبیا
## 3 Iémen Йемен Jemen Yemen Ємен یمن
## 4 Vietname Вьетнам Vietnam Vietnam В'єтнам ویتنام
## 5 Venezuela Венесуэла Venezuela Venezuela Венесуела وینیزویلا
## 6 Vaticano Ватикан Vatikanstaten Vatikan Ватикан ویٹیکن سٹی
## name_vi name_zh name_zht fclass_iso tlc_diff fclass_tlc
## 1 Zimbabwe 津巴布韦 辛巴威 Admin-0 country <NA> Admin-0 country
## 2 Zambia 赞比亚 尚比亞 Admin-0 country <NA> Admin-0 country
## 3 Yemen 也门 葉門 Admin-0 country <NA> Admin-0 country
## 4 Việt Nam 越南 越南 Admin-0 country <NA> Admin-0 country
## 5 Venezuela 委内瑞拉 委內瑞拉 Admin-0 country <NA> Admin-0 country
## 6 Thành Vatican 梵蒂冈 梵蒂岡 Admin-0 country <NA> Admin-0 country
## fclass_us fclass_fr fclass_ru fclass_es fclass_cn fclass_tw fclass_in
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## fclass_np fclass_pk fclass_de fclass_gb fclass_br fclass_il fclass_ps
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## fclass_sa fclass_eg fclass_ma fclass_pt fclass_ar fclass_jp fclass_ko
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## fclass_vn fclass_tr fclass_id fclass_pl fclass_gr fclass_it fclass_nl
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## fclass_se fclass_bd fclass_ua geometry
## 1 <NA> <NA> <NA> MULTIPOLYGON (((31.28789 -2...
## 2 <NA> <NA> <NA> MULTIPOLYGON (((30.39609 -1...
## 3 <NA> <NA> <NA> MULTIPOLYGON (((53.08564 16...
## 4 <NA> <NA> <NA> MULTIPOLYGON (((104.064 10....
## 5 <NA> <NA> <NA> MULTIPOLYGON (((-60.82119 9...
## 6 <NA> <NA> <NA> MULTIPOLYGON (((12.43916 41...
``` r
# Check the column names in both datasets
colnames(world)
```
## [1] "featurecla" "scalerank" "labelrank" "sovereignt" "sov_a3"
## [6] "adm0_dif" "level" "type" "tlc" "admin"
## [11] "adm0_a3" "geou_dif" "geounit" "gu_a3" "su_dif"
## [16] "subunit" "su_a3" "brk_diff" "name" "name_long"
## [21] "brk_a3" "brk_name" "brk_group" "abbrev" "postal"
## [26] "formal_en" "formal_fr" "name_ciawf" "note_adm0" "note_brk"
## [31] "name_sort" "name_alt" "mapcolor7" "mapcolor8" "mapcolor9"
## [36] "mapcolor13" "pop_est" "pop_rank" "pop_year" "gdp_md"
## [41] "gdp_year" "economy" "income_grp" "fips_10" "iso_a2"
## [46] "iso_a2_eh" "iso_a3" "iso_a3_eh" "iso_n3" "iso_n3_eh"
## [51] "un_a3" "wb_a2" "wb_a3" "woe_id" "woe_id_eh"
## [56] "woe_note" "adm0_iso" "adm0_diff" "adm0_tlc" "adm0_a3_us"
## [61] "adm0_a3_fr" "adm0_a3_ru" "adm0_a3_es" "adm0_a3_cn" "adm0_a3_tw"
## [66] "adm0_a3_in" "adm0_a3_np" "adm0_a3_pk" "adm0_a3_de" "adm0_a3_gb"
## [71] "adm0_a3_br" "adm0_a3_il" "adm0_a3_ps" "adm0_a3_sa" "adm0_a3_eg"
## [76] "adm0_a3_ma" "adm0_a3_pt" "adm0_a3_ar" "adm0_a3_jp" "adm0_a3_ko"
## [81] "adm0_a3_vn" "adm0_a3_tr" "adm0_a3_id" "adm0_a3_pl" "adm0_a3_gr"
## [86] "adm0_a3_it" "adm0_a3_nl" "adm0_a3_se" "adm0_a3_bd" "adm0_a3_ua"
## [91] "adm0_a3_un" "adm0_a3_wb" "continent" "region_un" "subregion"
## [96] "region_wb" "name_len" "long_len" "abbrev_len" "tiny"
## [101] "homepart" "min_zoom" "min_label" "max_label" "label_x"
## [106] "label_y" "ne_id" "wikidataid" "name_ar" "name_bn"
## [111] "name_de" "name_en" "name_es" "name_fa" "name_fr"
## [116] "name_el" "name_he" "name_hi" "name_hu" "name_id"
## [121] "name_it" "name_ja" "name_ko" "name_nl" "name_pl"
## [126] "name_pt" "name_ru" "name_sv" "name_tr" "name_uk"
## [131] "name_ur" "name_vi" "name_zh" "name_zht" "fclass_iso"
## [136] "tlc_diff" "fclass_tlc" "fclass_us" "fclass_fr" "fclass_ru"
## [141] "fclass_es" "fclass_cn" "fclass_tw" "fclass_in" "fclass_np"
## [146] "fclass_pk" "fclass_de" "fclass_gb" "fclass_br" "fclass_il"
## [151] "fclass_ps" "fclass_sa" "fclass_eg" "fclass_ma" "fclass_pt"
## [156] "fclass_ar" "fclass_jp" "fclass_ko" "fclass_vn" "fclass_tr"
## [161] "fclass_id" "fclass_pl" "fclass_gr" "fclass_it" "fclass_nl"
## [166] "fclass_se" "fclass_bd" "fclass_ua" "geometry"
``` r
colnames(countries_data)
```
## [1] "Entity" "Code" "Year" "Prevalence"
``` r
# Unique ISO3 country codes in the world dataset
unique(world$iso_a3)
```
## [1] "ZWE" "ZMB" "YEM" "VNM" "VEN" "VAT" "VUT" "UZB" "URY" "FSM" "MHL" "MNP"
## [13] "VIR" "GUM" "ASM" "PRI" "USA" "SGS" "IOT" "SHN" "PCN" "AIA" "FLK" "CYM"
## [25] "BMU" "VGB" "TCA" "MSR" "JEY" "GGY" "IMN" "GBR" "ARE" "UKR" "UGA" "TKM"
## [37] "TUR" "TUN" "TTO" "TON" "TGO" "TLS" "THA" "TZA" "TJK" "TWN" "SYR" "CHE"
## [49] "SWE" "SWZ" "SUR" "SSD" "SDN" "LKA" "ESP" "KOR" "ZAF" "SOM" "-99" "SLB"
## [61] "SVK" "SVN" "SGP" "SLE" "SYC" "SRB" "SEN" "SAU" "STP" "SMR" "WSM" "VCT"
## [73] "LCA" "KNA" "RWA" "RUS" "ROU" "QAT" "PRT" "POL" "PHL" "PER" "PRY" "PNG"
## [85] "PAN" "PLW" "PAK" "OMN" "PRK" "NGA" "NER" "NIC" "NZL" "NIU" "COK" "NLD"
## [97] "ABW" "CUW" "NPL" "NRU" "NAM" "MOZ" "MAR" "ESH" "MNE" "MNG" "MDA" "MCO"
## [109] "MEX" "MUS" "MRT" "MLT" "MLI" "MDV" "MYS" "MWI" "MDG" "MKD" "LUX" "LTU"
## [121] "LIE" "LBY" "LBR" "LSO" "LBN" "LVA" "LAO" "KGZ" "KWT" "KIR" "KEN" "KAZ"
## [133] "JOR" "JPN" "JAM" "ITA" "ISR" "PSE" "IRL" "IRQ" "IRN" "IDN" "IND" "ISL"
## [145] "HUN" "HND" "HTI" "GUY" "GNB" "GIN" "GTM" "GRD" "GRC" "GHA" "DEU" "GEO"
## [157] "GMB" "GAB" "SPM" "WLF" "MAF" "BLM" "PYF" "NCL" "ATF" "ALA" "FIN" "FJI"
## [169] "ETH" "EST" "ERI" "GNQ" "SLV" "EGY" "ECU" "DOM" "DMA" "DJI" "GRL" "FRO"
## [181] "DNK" "CZE" "CYP" "CUB" "HRV" "CIV" "CRI" "COD" "COG" "COM" "COL" "CHN"
## [193] "MAC" "HKG" "CHL" "TCD" "CAF" "CPV" "CAN" "CMR" "KHM" "MMR" "BDI" "BFA"
## [205] "BGR" "BRN" "BRA" "BWA" "BIH" "BOL" "BTN" "BEN" "BLZ" "BEL" "BLR" "BRB"
## [217] "BGD" "BHR" "BHS" "AZE" "AUT" "AUS" "HMD" "NFK" "ARM" "ARG" "ATG" "AGO"
## [229] "AND" "DZA" "ALB" "AFG" "ATA" "SXM" "TUV"
``` r
# Unique ISO3 country codes in the countries_data dataset
unique(countries_data$Code)
```
## [1] "AFG" "ALB" "DZA" "AND" "ARG" "ARM" "AUS" "AUT" "AZE" "BHS" "BHR" "BGD"
## [13] "BRB" "BLR" "BEL" "BLZ" "BEN" "BOL" "BIH" "BWA" "BRA" "BRN" "BGR" "BFA"
## [25] "BDI" "KHM" "CMR" "CAN" "CPV" "TCD" "CHL" "CHN" "COL" "COM" "COG" "CRI"
## [37] "CIV" "HRV" "CUB" "CYP" "CZE" "COD" "DNK" "DOM" "TLS" "ECU" "EGY" "SLV"
## [49] "ERI" "EST" "SWZ" "ETH" "FJI" "FIN" "FRA" "GMB" "GEO" "DEU" "GHA" "GRC"
## [61] "GTM" "GNB" "GUY" "HTI" "HUN" "ISL" "IND" "IDN" "IRN" "IRQ" "IRL" "ISR"
## [73] "ITA" "JAM" "JPN" "JOR" "KAZ" "KEN" "KIR" "KWT" "KGZ" "LAO" "LVA" "LBN"
## [85] "LSO" "LBR" "LTU" "LUX" "MDG" "MWI" "MYS" "MDV" "MLI" "MLT" "MHL" "MRT"
## [97] "MUS" "MEX" "MDA" "MNG" "MNE" "MAR" "MOZ" "MMR" "NAM" "NRU" "NPL" "NLD"
## [109] "NZL" "NER" "NGA" "PRK" "NOR" "OMN" "PAK" "PLW" "PAN" "PNG" "PRY" "PER"
## [121] "PHL" "POL" "PRT" "QAT" "ROU" "RUS" "RWA" "WSM" "STP" "SAU" "SEN" "SRB"
## [133] "SYC" "SLE" "SGP" "SVK" "SVN" "SLB" "ZAF" "KOR" "ESP" "LKA" "SWE" "CHE"
## [145] "TZA" "THA" "TGO" "TON" "TUN" "TUR" "TKM" "TUV" "UGA" "UKR" "GBR" "USA"
## [157] "URY" "UZB" "VUT" "VNM" "YEM" "ZMB" "ZWE"
``` r
# Identify codes in countries_data but not in world
missing_in_world <- setdiff(countries_data$Code, world$iso_a3)
print(missing_in_world)
```
## [1] "FRA" "NOR"
``` r
# Identify codes in world but not in countries_data
missing_in_data <- setdiff(world$iso_a3, countries_data$Code)
print(missing_in_data)
```
## [1] "VEN" "VAT" "FSM" "MNP" "VIR" "GUM" "ASM" "PRI" "SGS" "IOT" "SHN" "PCN"
## [13] "AIA" "FLK" "CYM" "BMU" "VGB" "TCA" "MSR" "JEY" "GGY" "IMN" "ARE" "TTO"
## [25] "TJK" "TWN" "SYR" "SUR" "SSD" "SDN" "SOM" "-99" "SMR" "VCT" "LCA" "KNA"
## [37] "NIC" "NIU" "COK" "ABW" "CUW" "ESH" "MCO" "MKD" "LIE" "LBY" "PSE" "HND"
## [49] "GIN" "GRD" "GAB" "SPM" "WLF" "MAF" "BLM" "PYF" "NCL" "ATF" "ALA" "GNQ"
## [61] "DMA" "DJI" "GRL" "FRO" "MAC" "HKG" "CAF" "BTN" "HMD" "NFK" "ATG" "AGO"
## [73] "ATA" "SXM"
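`setdiff(x, y)` is asymmetric: it returns the unique elements of `x` that are not in `y`, so the check has to run in both directions. Below is a minimal sketch that wraps both directions in one hypothetical helper, `compare_codes()`. It uses hypothetical code vectors in which `"-99"` mimics the placeholder code visible in the `world$iso_a3` output above.

```r
# Compare two sets of ISO codes in both directions.
compare_codes <- function(data_codes, map_codes) {
  list(
    in_data_not_map = setdiff(data_codes, map_codes),
    in_map_not_data = setdiff(map_codes, data_codes)
  )
}

# Hypothetical codes; "-99" stands in for a placeholder code in the map data
data_codes <- c("CHL", "FRA", "NOR")
map_codes  <- c("CHL", "-99", "-99", "VEN")
res <- compare_codes(data_codes, map_codes)
res
```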
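``` r
# Fix missing ISO codes: France and Norway are not matched by iso_a3 in the
# world data (see the setdiff() check above), so set their codes by name
fix_iso_codes <- function(world_data) {
  world_data %>%
    mutate(iso_a3 = ifelse(name == "France", "FRA", iso_a3)) %>%
    mutate(iso_a3 = ifelse(name == "Norway", "NOR", iso_a3))
}
world <- fix_iso_codes(world)
```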
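The same fix can be expressed as a lookup table, which scales better if more countries need a manual code. Below is a minimal sketch in base R on a hypothetical data frame. The `iso_overrides` vector and the `fix_iso_codes_lookup()` helper are illustrative assumptions.

```r
# Hypothetical lookup table: country name -> ISO3 code to use
iso_overrides <- c(France = "FRA", Norway = "NOR")

fix_iso_codes_lookup <- function(world_data, overrides) {
  # Rows whose name appears in the lookup table
  hit <- world_data$name %in% names(overrides)
  # Look up each matched name and overwrite its iso_a3 code
  world_data$iso_a3[hit] <- unname(overrides[world_data$name[hit]])
  world_data
}

# Hypothetical stand-in for the world map data
world_toy <- data.frame(
  name   = c("France", "Norway", "Chile"),
  iso_a3 = c("-99", "-99", "CHL"),
  stringsAsFactors = FALSE
)
fixed <- fix_iso_codes_lookup(world_toy, iso_overrides)
fixed
```

Natural Earth also ships an `iso_a3_eh` column (listed in the `colnames(world)` output above). Checking whether it already holds the expected codes might make a manual fix unnecessary.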
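``` r
# Merge world map data with smoking data (left join: every map country is kept,
# with one row per country-year for countries that have data)
map_data <- world %>%
  left_join(countries_data, by = c("iso_a3" = "Code"))
# Replace NA prevalence values with 0
# (countries with no data in the smoking dataset will therefore show as 0%)
map_data <- map_data %>%
  mutate(Prevalence = ifelse(is.na(Prevalence), 0, Prevalence))
```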
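Replacing `NA` with 0 makes a country without data look like a country where nobody uses tobacco, and it also pulls summary statistics down. Below is a minimal sketch on hypothetical values comparing the two treatments.

```r
# Hypothetical prevalence values; NA means "no data for this country"
prevalence <- c(20, 30, NA)

# Treating missing data as 0 lowers the average
prevalence_zero <- ifelse(is.na(prevalence), 0, prevalence)
mean_with_zero <- mean(prevalence_zero)

# Keeping NA and excluding it averages only the countries that have data
mean_without_missing <- mean(prevalence, na.rm = TRUE)

c(zero_filled = mean_with_zero, na_removed = mean_without_missing)
```

On a ggplot2 map, an alternative is to keep `NA` and set `na.value` in the fill scale, for example `scale_fill_viridis_c(na.value = "grey80")`. That shows countries without data in a distinct colour.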
``` r
# Inspect the merged data
str(map_data)
```
## Classes 'sf' and 'data.frame': 894 obs. of 172 variables:
## $ featurecla: chr "Admin-0 country" "Admin-0 country" "Admin-0 country" "Admin-0 country" ...
## $ scalerank : int 1 1 1 1 1 1 1 1 1 1 ...
## $ labelrank : int 3 3 3 3 3 3 3 3 3 3 ...
## $ sovereignt: chr "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" ...
## $ sov_a3 : chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_dif : int 0 0 0 0 0 0 0 0 0 0 ...
## $ level : int 2 2 2 2 2 2 2 2 2 2 ...
## $ type : chr "Sovereign country" "Sovereign country" "Sovereign country" "Sovereign country" ...
## $ tlc : chr "1" "1" "1" "1" ...
## $ admin : chr "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" ...
## $ adm0_a3 : chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ geou_dif : int 0 0 0 0 0 0 0 0 0 0 ...
## $ geounit : chr "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" ...
## $ gu_a3 : chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ su_dif : int 0 0 0 0 0 0 0 0 0 0 ...
## $ subunit : chr "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" ...
## $ su_a3 : chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ brk_diff : int 0 0 0 0 0 0 0 0 0 0 ...
## $ name : chr "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" ...
## $ name_long : chr "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" ...
## $ brk_a3 : chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ brk_name : chr "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" ...
## $ brk_group : chr NA NA NA NA ...
## $ abbrev : chr "Zimb." "Zimb." "Zimb." "Zimb." ...
## $ postal : chr "ZW" "ZW" "ZW" "ZW" ...
## $ formal_en : chr "Republic of Zimbabwe" "Republic of Zimbabwe" "Republic of Zimbabwe" "Republic of Zimbabwe" ...
## $ formal_fr : chr NA NA NA NA ...
## $ name_ciawf: chr "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" ...
## $ note_adm0 : chr NA NA NA NA ...
## $ note_brk : chr NA NA NA NA ...
## $ name_sort : chr "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" ...
## $ name_alt : chr NA NA NA NA ...
## $ mapcolor7 : int 1 1 1 1 1 5 5 5 5 5 ...
## $ mapcolor8 : int 5 5 5 5 5 8 8 8 8 8 ...
## $ mapcolor9 : int 3 3 3 3 3 5 5 5 5 5 ...
## $ mapcolor13: int 9 9 9 9 9 13 13 13 13 13 ...
## $ pop_est : num 14645468 14645468 14645468 14645468 14645468 ...
## $ pop_rank : int 14 14 14 14 14 14 14 14 14 14 ...
## $ pop_year : int 2019 2019 2019 2019 2019 2019 2019 2019 2019 2019 ...
## $ gdp_md : int 21440 21440 21440 21440 21440 23309 23309 23309 23309 23309 ...
## $ gdp_year : int 2019 2019 2019 2019 2019 2019 2019 2019 2019 2019 ...
## $ economy : chr "5. Emerging region: G20" "5. Emerging region: G20" "5. Emerging region: G20" "5. Emerging region: G20" ...
## $ income_grp: chr "5. Low income" "5. Low income" "5. Low income" "5. Low income" ...
## $ fips_10 : chr "ZI" "ZI" "ZI" "ZI" ...
## $ iso_a2 : chr "ZW" "ZW" "ZW" "ZW" ...
## $ iso_a2_eh : chr "ZW" "ZW" "ZW" "ZW" ...
## $ iso_a3 : chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ iso_a3_eh : chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ iso_n3 : chr "716" "716" "716" "716" ...
## $ iso_n3_eh : chr "716" "716" "716" "716" ...
## $ un_a3 : chr "716" "716" "716" "716" ...
## $ wb_a2 : chr "ZW" "ZW" "ZW" "ZW" ...
## $ wb_a3 : chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ woe_id : int 23425004 23425004 23425004 23425004 23425004 23425003 23425003 23425003 23425003 23425003 ...
## $ woe_id_eh : int 23425004 23425004 23425004 23425004 23425004 23425003 23425003 23425003 23425003 23425003 ...
## $ woe_note : chr "Exact WOE match as country" "Exact WOE match as country" "Exact WOE match as country" "Exact WOE match as country" ...
## $ adm0_iso : chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_diff : chr NA NA NA NA ...
## $ adm0_tlc : chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_us: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_fr: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_ru: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_es: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_cn: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_tw: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_in: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_np: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_pk: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_de: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_gb: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_br: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_il: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_ps: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_sa: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_eg: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_ma: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_pt: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_ar: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_jp: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_ko: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_vn: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_tr: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_id: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_pl: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_gr: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_it: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_nl: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_se: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_bd: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_ua: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_un: int -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 ...
## $ adm0_a3_wb: int -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 ...
## $ continent : chr "Africa" "Africa" "Africa" "Africa" ...
## $ region_un : chr "Africa" "Africa" "Africa" "Africa" ...
## $ subregion : chr "Eastern Africa" "Eastern Africa" "Eastern Africa" "Eastern Africa" ...
## $ region_wb : chr "Sub-Saharan Africa" "Sub-Saharan Africa" "Sub-Saharan Africa" "Sub-Saharan Africa" ...
## $ name_len : int 8 8 8 8 8 6 6 6 6 6 ...
## $ long_len : int 8 8 8 8 8 6 6 6 6 6 ...
## $ abbrev_len: int 5 5 5 5 5 6 6 6 6 6 ...
## [list output truncated]
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
## ..- attr(*, "names")= chr [1:171] "featurecla" "scalerank" "labelrank" "sovereignt" ...
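Because the join produces one row per country and year, a useful sanity check is that no country-year pair appears twice. Below is a minimal sketch using base R's `duplicated()` on a hypothetical data frame that contains one deliberate duplicate.

```r
# Hypothetical merged data with one repeated country-year (NOR, 2020)
merged_toy <- data.frame(
  iso_a3     = c("CHL", "CHL", "NOR", "NOR"),
  Year       = c(2000, 2020, 2020, 2020),
  Prevalence = c(40, 30, 15, 15)
)

# TRUE for each row that repeats an earlier country-year combination
dup <- duplicated(merged_toy[, c("iso_a3", "Year")])
merged_toy[dup, ]
any(dup)
```

The real `map_data` is an `sf` object, and selecting columns from it keeps the geometry column. Dropping the geometry first with `sf::st_drop_geometry()` is therefore likely needed before running this check.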
``` r
summary(map_data)
```
## featurecla scalerank labelrank sovereignt
## Length:894 Min. :1.00 Min. :2.000 Length:894
## Class :character 1st Qu.:1.00 1st Qu.:3.000 Class :character
## Mode :character Median :1.00 Median :4.000 Mode :character
## Mean :1.73 Mean :3.868
## 3rd Qu.:3.00 3rd Qu.:5.000
## Max. :6.00 Max. :7.000
##
## sov_a3 adm0_dif level type
## Length:894 Min. :0.000 Min. :1.000 Length:894
## Class :character 1st Qu.:0.000 1st Qu.:2.000 Class :character
## Mode :character Median :0.000 Median :2.000 Mode :character
## Mean :0.113 Mean :1.989
## 3rd Qu.:0.000 3rd Qu.:2.000
## Max. :1.000 Max. :2.000
##
## tlc admin adm0_a3 geou_dif
## Length:894 Length:894 Length:894 Min. :0
## Class :character Class :character Class :character 1st Qu.:0
## Mode :character Mode :character Mode :character Median :0
## Mean :0
## 3rd Qu.:0
## Max. :0
##
## geounit gu_a3 su_dif subunit
## Length:894 Length:894 Min. :0.00000 Length:894
## Class :character Class :character 1st Qu.:0.00000 Class :character
## Mode :character Mode :character Median :0.00000 Mode :character
## Mean :0.01119
## 3rd Qu.:0.00000
## Max. :1.00000
##
## su_a3 brk_diff name name_long
## Length:894 Min. :0.00000 Length:894 Length:894
## Class :character 1st Qu.:0.00000 Class :character Class :character
## Mode :character Median :0.00000 Mode :character Mode :character
## Mean :0.01007
## 3rd Qu.:0.00000
## Max. :1.00000
##
## brk_a3 brk_name brk_group abbrev
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## postal formal_en formal_fr name_ciawf
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## note_adm0 note_brk name_sort name_alt
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## mapcolor7 mapcolor8 mapcolor9 mapcolor13
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :-99.000
## 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.: 3.000
## Median :3.000 Median :3.000 Median :3.000 Median : 6.000
## Mean :3.177 Mean :3.459 Mean :3.752 Mean : 6.208
## 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.: 9.000
## Max. :7.000 Max. :8.000 Max. :9.000 Max. : 13.000
##
## pop_est pop_rank pop_year gdp_md
## Min. :0.000e+00 Min. : 1.00 Min. :2011 Min. : -99
## 1st Qu.:1.962e+06 1st Qu.:12.00 1st Qu.:2019 1st Qu.: 10354
## Median :8.776e+06 Median :13.00 Median :2019 Median : 40000
## Mean :4.177e+07 Mean :12.98 Mean :2019 Mean : 479934
## 3rd Qu.:3.037e+07 3rd Qu.:15.00 3rd Qu.:2019 3rd Qu.: 250529
## Max. :1.398e+09 Max. :18.00 Max. :2020 Max. :21433226
##
## gdp_year economy income_grp fips_10
## Min. :2003 Length:894 Length:894 Length:894
## 1st Qu.:2019 Class :character Class :character Class :character
## Median :2019 Mode :character Mode :character Mode :character
## Mean :2019
## 3rd Qu.:2019
## Max. :2019
##
## iso_a2 iso_a2_eh iso_a3 iso_a3_eh
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## iso_n3 iso_n3_eh un_a3 wb_a2
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## wb_a3 woe_id woe_id_eh woe_note
## Length:894 Min. : -99 Min. : -99 Length:894
## Class :character 1st Qu.:23424781 1st Qu.:23424794 Class :character
## Mode :character Median :23424863 Median :23424871 Mode :character
## Mean :22202644 Mean :23423325
## 3rd Qu.:23424929 3rd Qu.:23424933
## Max. :56042305 Max. :56042305
##
## adm0_iso adm0_diff adm0_tlc adm0_a3_us
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## adm0_a3_fr adm0_a3_ru adm0_a3_es adm0_a3_cn
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## adm0_a3_tw adm0_a3_in adm0_a3_np adm0_a3_pk
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## adm0_a3_de adm0_a3_gb adm0_a3_br adm0_a3_il
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## adm0_a3_ps adm0_a3_sa adm0_a3_eg adm0_a3_ma
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## adm0_a3_pt adm0_a3_ar adm0_a3_jp adm0_a3_ko
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## adm0_a3_vn adm0_a3_tr adm0_a3_id adm0_a3_pl
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## adm0_a3_gr adm0_a3_it adm0_a3_nl adm0_a3_se
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## adm0_a3_bd adm0_a3_ua adm0_a3_un adm0_a3_wb
## Length:894 Length:894 Min. :-99 Min. :-99
## Class :character Class :character 1st Qu.:-99 1st Qu.:-99
## Mode :character Mode :character Median :-99 Median :-99
## Mean :-99 Mean :-99
## 3rd Qu.:-99 3rd Qu.:-99
## Max. :-99 Max. :-99
##
## continent region_un subregion region_wb
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## name_len long_len abbrev_len tiny
## Min. : 4.000 Min. : 4.000 Min. : 3.000 Min. :-99.0
## 1st Qu.: 6.000 1st Qu.: 6.000 1st Qu.: 4.000 1st Qu.:-99.0
## Median : 7.000 Median : 7.000 Median : 4.000 Median :-99.0
## Mean : 8.102 Mean : 8.935 Mean : 4.736 Mean :-82.1
## 3rd Qu.:10.000 3rd Qu.:10.000 3rd Qu.: 5.000 3rd Qu.:-99.0
## Max. :25.000 Max. :35.000 Max. :13.000 Max. : 6.0
##
## homepart min_zoom min_label max_label
## Min. :-99.000 Min. :0.00000 Min. :1.700 Min. : 5.200
## 1st Qu.: 1.000 1st Qu.:0.00000 1st Qu.:2.700 1st Qu.: 7.000
## Median : 1.000 Median :0.00000 Median :3.000 Median : 8.000
## Mean : -3.586 Mean :0.02315 Mean :3.392 Mean : 8.268
## 3rd Qu.: 1.000 3rd Qu.:0.00000 3rd Qu.:4.000 3rd Qu.: 9.000
## Max. : 1.000 Max. :7.00000 Max. :6.500 Max. :11.000
##
## label_x label_y ne_id wikidataid
## Min. :-178.137 Min. :-79.84 Min. :1.159e+09 Length:894
## 1st Qu.: -7.187 1st Qu.: 1.48 1st Qu.:1.159e+09 Class :character
## Median : 21.726 Median : 18.69 Median :1.159e+09 Mode :character
## Mean : 21.259 Mean : 18.70 Mean :1.159e+09
## 3rd Qu.: 51.144 3rd Qu.: 40.40 3rd Qu.:1.159e+09
## Max. : 179.210 Max. : 74.32 Max. :1.159e+09
##
## name_ar name_bn name_de name_en
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## name_es name_fa name_fr name_el
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## name_he name_hi name_hu name_id
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## name_it name_ja name_ko name_nl
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## name_pl name_pt name_ru name_sv
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## name_tr name_uk name_ur name_vi
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## name_zh name_zht fclass_iso tlc_diff
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## fclass_tlc fclass_us fclass_fr fclass_ru
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## fclass_es fclass_cn fclass_tw fclass_in
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## fclass_np fclass_pk fclass_de fclass_gb
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## fclass_br fclass_il fclass_ps fclass_sa
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## fclass_eg fclass_ma fclass_pt fclass_ar
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## fclass_jp fclass_ko fclass_vn fclass_tr
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## fclass_id fclass_pl fclass_gr fclass_it
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## fclass_nl fclass_se fclass_bd fclass_ua
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Entity Year Prevalence geometry
## Length:894 Min. :2000 Min. : 0.00 MULTIPOLYGON :894
## Class :character 1st Qu.:2005 1st Qu.:13.00 epsg:4326 : 0
## Mode :character Median :2010 Median :23.30 +proj=long...: 0
## Mean :2010 Mean :22.59
## 3rd Qu.:2015 3rd Qu.:31.50
## Max. :2020 Max. :68.50
## NA's :79
# Drop the geometry column so Plotly receives a plain data frame
plot_data <- map_data %>%
st_set_geometry(NULL)
# Lowest and highest prevalence values for the interactive map
min_prevalence <- min(plot_data$Prevalence, na.rm = TRUE)
max_prevalence <- max(plot_data$Prevalence, na.rm = TRUE)
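The two bounds are presumably computed so that the map's colour scale stays the same across years. As a minimal base-R sketch (the `prev` vector is made up and only stands in for `plot_data$Prevalence`), `range(..., na.rm = TRUE)` returns both bounds in one call, and the bounds can then rescale values onto a 0 to 1 colour axis:

```r
# Hypothetical prevalence values (%), including one missing entry
prev <- c(13.0, NA, 68.5, 0.0, 23.3)

# range() returns c(min, max); na.rm = TRUE ignores the NA
prev_range <- range(prev, na.rm = TRUE)

# Rescale onto [0, 1] so every year shares the same colour mapping
scaled <- (prev - prev_range[1]) / diff(prev_range)

print(prev_range)
print(round(scaled, 3))
```

The rescaling step is only an illustration; a plotting library would normally take the two bounds directly as colour-scale limits.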
# Check the structure of the dataset
str(plot_data)
## 'data.frame': 894 obs. of 171 variables:
## $ featurecla: chr "Admin-0 country" "Admin-0 country" "Admin-0 country" "Admin-0 country" ...
## $ scalerank : int 1 1 1 1 1 1 1 1 1 1 ...
## $ labelrank : int 3 3 3 3 3 3 3 3 3 3 ...
## $ sovereignt: chr "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" ...
## $ sov_a3 : chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_dif : int 0 0 0 0 0 0 0 0 0 0 ...
## $ level : int 2 2 2 2 2 2 2 2 2 2 ...
## $ type : chr "Sovereign country" "Sovereign country" "Sovereign country" "Sovereign country" ...
## $ tlc : chr "1" "1" "1" "1" ...
## $ admin : chr "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" ...
## $ adm0_a3 : chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ geou_dif : int 0 0 0 0 0 0 0 0 0 0 ...
## $ geounit : chr "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" ...
## $ gu_a3 : chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ su_dif : int 0 0 0 0 0 0 0 0 0 0 ...
## $ subunit : chr "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" ...
## $ su_a3 : chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ brk_diff : int 0 0 0 0 0 0 0 0 0 0 ...
## $ name : chr "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" ...
## $ name_long : chr "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" ...
## $ brk_a3 : chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ brk_name : chr "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" ...
## $ brk_group : chr NA NA NA NA ...
## $ abbrev : chr "Zimb." "Zimb." "Zimb." "Zimb." ...
## $ postal : chr "ZW" "ZW" "ZW" "ZW" ...
## $ formal_en : chr "Republic of Zimbabwe" "Republic of Zimbabwe" "Republic of Zimbabwe" "Republic of Zimbabwe" ...
## $ formal_fr : chr NA NA NA NA ...
## $ name_ciawf: chr "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" ...
## $ note_adm0 : chr NA NA NA NA ...
## $ note_brk : chr NA NA NA NA ...
## $ name_sort : chr "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" ...
## $ name_alt : chr NA NA NA NA ...
## $ mapcolor7 : int 1 1 1 1 1 5 5 5 5 5 ...
## $ mapcolor8 : int 5 5 5 5 5 8 8 8 8 8 ...
## $ mapcolor9 : int 3 3 3 3 3 5 5 5 5 5 ...
## $ mapcolor13: int 9 9 9 9 9 13 13 13 13 13 ...
## $ pop_est : num 14645468 14645468 14645468 14645468 14645468 ...
## $ pop_rank : int 14 14 14 14 14 14 14 14 14 14 ...
## $ pop_year : int 2019 2019 2019 2019 2019 2019 2019 2019 2019 2019 ...
## $ gdp_md : int 21440 21440 21440 21440 21440 23309 23309 23309 23309 23309 ...
## $ gdp_year : int 2019 2019 2019 2019 2019 2019 2019 2019 2019 2019 ...
## $ economy : chr "5. Emerging region: G20" "5. Emerging region: G20" "5. Emerging region: G20" "5. Emerging region: G20" ...
## $ income_grp: chr "5. Low income" "5. Low income" "5. Low income" "5. Low income" ...
## $ fips_10 : chr "ZI" "ZI" "ZI" "ZI" ...
## $ iso_a2 : chr "ZW" "ZW" "ZW" "ZW" ...
## $ iso_a2_eh : chr "ZW" "ZW" "ZW" "ZW" ...
## $ iso_a3 : chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ iso_a3_eh : chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ iso_n3 : chr "716" "716" "716" "716" ...
## $ iso_n3_eh : chr "716" "716" "716" "716" ...
## $ un_a3 : chr "716" "716" "716" "716" ...
## $ wb_a2 : chr "ZW" "ZW" "ZW" "ZW" ...
## $ wb_a3 : chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ woe_id : int 23425004 23425004 23425004 23425004 23425004 23425003 23425003 23425003 23425003 23425003 ...
## $ woe_id_eh : int 23425004 23425004 23425004 23425004 23425004 23425003 23425003 23425003 23425003 23425003 ...
## $ woe_note : chr "Exact WOE match as country" "Exact WOE match as country" "Exact WOE match as country" "Exact WOE match as country" ...
## $ adm0_iso : chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_diff : chr NA NA NA NA ...
## $ adm0_tlc : chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_us: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_fr: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_ru: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_es: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_cn: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_tw: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_in: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_np: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_pk: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_de: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_gb: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_br: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_il: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_ps: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_sa: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_eg: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_ma: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_pt: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_ar: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_jp: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_ko: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_vn: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_tr: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_id: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_pl: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_gr: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_it: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_nl: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_se: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_bd: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_ua: chr "ZWE" "ZWE" "ZWE" "ZWE" ...
## $ adm0_a3_un: int -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 ...
## $ adm0_a3_wb: int -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 ...
## $ continent : chr "Africa" "Africa" "Africa" "Africa" ...
## $ region_un : chr "Africa" "Africa" "Africa" "Africa" ...
## $ subregion : chr "Eastern Africa" "Eastern Africa" "Eastern Africa" "Eastern Africa" ...
## $ region_wb : chr "Sub-Saharan Africa" "Sub-Saharan Africa" "Sub-Saharan Africa" "Sub-Saharan Africa" ...
## $ name_len : int 8 8 8 8 8 6 6 6 6 6 ...
## $ long_len : int 8 8 8 8 8 6 6 6 6 6 ...
## $ abbrev_len: int 5 5 5 5 5 6 6 6 6 6 ...
## [list output truncated]
# List any country-year pairs that appear more than once
duplicates <- plot_data %>%
group_by(iso_a3, Year) %>%
filter(n() > 1)
print(duplicates)
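The `group_by()` plus `filter(n() > 1)` idiom keeps every row whose country-year key occurs more than once. For comparison, here is a dependency-free base-R sketch of the same check on a small hypothetical data frame; the codes and values are invented, with `"-99"` mimicking the placeholder code that appears in the `world` codes listed further down:

```r
# Hypothetical country-year data with a repeated key ("-99", 2010)
df <- data.frame(
  iso_a3 = c("ZWE", "-99", "-99", "GBR"),
  Year = c(2010, 2010, 2010, 2010),
  Prevalence = c(15.2, 30.1, 30.1, 17.0)
)

key <- df[, c("iso_a3", "Year")]

# duplicated() flags later repeats; fromLast = TRUE flags earlier ones,
# so OR-ing the two marks every row that shares its key with another row
dup_rows <- duplicated(key) | duplicated(key, fromLast = TRUE)

print(df[dup_rows, ])
```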
## # A tibble: 6 × 171
## # Groups: iso_a3, Year [1]
## featurecla scalerank labelrank sovereignt sov_a3 adm0_dif level type tlc
## <chr> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 Admin-0 coun… 1 5 Somaliland SOL 0 2 Sove… 1
## 2 Admin-0 coun… 1 6 Kosovo KOS 0 2 Disp… 1
## 3 Admin-0 coun… 1 6 Northern … CYN 0 2 Sove… 1
## 4 Admin-0 coun… 5 5 Australia AU1 1 2 Depe… 1
## 5 Admin-0 coun… 5 5 Australia AU1 1 2 Depe… 1
## 6 Admin-0 coun… 1 5 Kashmir KAS 0 2 Inde… <NA>
## # ℹ 162 more variables: admin <chr>, adm0_a3 <chr>, geou_dif <int>,
## # geounit <chr>, gu_a3 <chr>, su_dif <int>, subunit <chr>, su_a3 <chr>,
## # brk_diff <int>, name <chr>, name_long <chr>, brk_a3 <chr>, brk_name <chr>,
## # brk_group <chr>, abbrev <chr>, postal <chr>, formal_en <chr>,
## # formal_fr <chr>, name_ciawf <chr>, note_adm0 <chr>, note_brk <chr>,
## # name_sort <chr>, name_alt <chr>, mapcolor7 <int>, mapcolor8 <int>,
## # mapcolor9 <int>, mapcolor13 <int>, pop_est <dbl>, pop_rank <int>, …
# Check for missing or incorrect values
summary(plot_data)
## featurecla scalerank labelrank sovereignt
## Length:894 Min. :1.00 Min. :2.000 Length:894
## Class :character 1st Qu.:1.00 1st Qu.:3.000 Class :character
## Mode :character Median :1.00 Median :4.000 Mode :character
## Mean :1.73 Mean :3.868
## 3rd Qu.:3.00 3rd Qu.:5.000
## Max. :6.00 Max. :7.000
##
## sov_a3 adm0_dif level type
## Length:894 Min. :0.000 Min. :1.000 Length:894
## Class :character 1st Qu.:0.000 1st Qu.:2.000 Class :character
## Mode :character Median :0.000 Median :2.000 Mode :character
## Mean :0.113 Mean :1.989
## 3rd Qu.:0.000 3rd Qu.:2.000
## Max. :1.000 Max. :2.000
##
## tlc admin adm0_a3 geou_dif
## Length:894 Length:894 Length:894 Min. :0
## Class :character Class :character Class :character 1st Qu.:0
## Mode :character Mode :character Mode :character Median :0
## Mean :0
## 3rd Qu.:0
## Max. :0
##
## geounit gu_a3 su_dif subunit
## Length:894 Length:894 Min. :0.00000 Length:894
## Class :character Class :character 1st Qu.:0.00000 Class :character
## Mode :character Mode :character Median :0.00000 Mode :character
## Mean :0.01119
## 3rd Qu.:0.00000
## Max. :1.00000
##
## su_a3 brk_diff name name_long
## Length:894 Min. :0.00000 Length:894 Length:894
## Class :character 1st Qu.:0.00000 Class :character Class :character
## Mode :character Median :0.00000 Mode :character Mode :character
## Mean :0.01007
## 3rd Qu.:0.00000
## Max. :1.00000
##
## brk_a3 brk_name brk_group abbrev
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## postal formal_en formal_fr name_ciawf
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## note_adm0 note_brk name_sort name_alt
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## mapcolor7 mapcolor8 mapcolor9 mapcolor13
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :-99.000
## 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.: 3.000
## Median :3.000 Median :3.000 Median :3.000 Median : 6.000
## Mean :3.177 Mean :3.459 Mean :3.752 Mean : 6.208
## 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.: 9.000
## Max. :7.000 Max. :8.000 Max. :9.000 Max. : 13.000
##
## pop_est pop_rank pop_year gdp_md
## Min. :0.000e+00 Min. : 1.00 Min. :2011 Min. : -99
## 1st Qu.:1.962e+06 1st Qu.:12.00 1st Qu.:2019 1st Qu.: 10354
## Median :8.776e+06 Median :13.00 Median :2019 Median : 40000
## Mean :4.177e+07 Mean :12.98 Mean :2019 Mean : 479934
## 3rd Qu.:3.037e+07 3rd Qu.:15.00 3rd Qu.:2019 3rd Qu.: 250529
## Max. :1.398e+09 Max. :18.00 Max. :2020 Max. :21433226
##
## gdp_year economy income_grp fips_10
## Min. :2003 Length:894 Length:894 Length:894
## 1st Qu.:2019 Class :character Class :character Class :character
## Median :2019 Mode :character Mode :character Mode :character
## Mean :2019
## 3rd Qu.:2019
## Max. :2019
##
## iso_a2 iso_a2_eh iso_a3 iso_a3_eh
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## iso_n3 iso_n3_eh un_a3 wb_a2
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## wb_a3 woe_id woe_id_eh woe_note
## Length:894 Min. : -99 Min. : -99 Length:894
## Class :character 1st Qu.:23424781 1st Qu.:23424794 Class :character
## Mode :character Median :23424863 Median :23424871 Mode :character
## Mean :22202644 Mean :23423325
## 3rd Qu.:23424929 3rd Qu.:23424933
## Max. :56042305 Max. :56042305
##
## adm0_iso adm0_diff adm0_tlc adm0_a3_us
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## adm0_a3_fr adm0_a3_ru adm0_a3_es adm0_a3_cn
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## adm0_a3_tw adm0_a3_in adm0_a3_np adm0_a3_pk
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## adm0_a3_de adm0_a3_gb adm0_a3_br adm0_a3_il
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## adm0_a3_ps adm0_a3_sa adm0_a3_eg adm0_a3_ma
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## adm0_a3_pt adm0_a3_ar adm0_a3_jp adm0_a3_ko
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## adm0_a3_vn adm0_a3_tr adm0_a3_id adm0_a3_pl
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## adm0_a3_gr adm0_a3_it adm0_a3_nl adm0_a3_se
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## adm0_a3_bd adm0_a3_ua adm0_a3_un adm0_a3_wb
## Length:894 Length:894 Min. :-99 Min. :-99
## Class :character Class :character 1st Qu.:-99 1st Qu.:-99
## Mode :character Mode :character Median :-99 Median :-99
## Mean :-99 Mean :-99
## 3rd Qu.:-99 3rd Qu.:-99
## Max. :-99 Max. :-99
##
## continent region_un subregion region_wb
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## name_len long_len abbrev_len tiny
## Min. : 4.000 Min. : 4.000 Min. : 3.000 Min. :-99.0
## 1st Qu.: 6.000 1st Qu.: 6.000 1st Qu.: 4.000 1st Qu.:-99.0
## Median : 7.000 Median : 7.000 Median : 4.000 Median :-99.0
## Mean : 8.102 Mean : 8.935 Mean : 4.736 Mean :-82.1
## 3rd Qu.:10.000 3rd Qu.:10.000 3rd Qu.: 5.000 3rd Qu.:-99.0
## Max. :25.000 Max. :35.000 Max. :13.000 Max. : 6.0
##
## homepart min_zoom min_label max_label
## Min. :-99.000 Min. :0.00000 Min. :1.700 Min. : 5.200
## 1st Qu.: 1.000 1st Qu.:0.00000 1st Qu.:2.700 1st Qu.: 7.000
## Median : 1.000 Median :0.00000 Median :3.000 Median : 8.000
## Mean : -3.586 Mean :0.02315 Mean :3.392 Mean : 8.268
## 3rd Qu.: 1.000 3rd Qu.:0.00000 3rd Qu.:4.000 3rd Qu.: 9.000
## Max. : 1.000 Max. :7.00000 Max. :6.500 Max. :11.000
##
## label_x label_y ne_id wikidataid
## Min. :-178.137 Min. :-79.84 Min. :1.159e+09 Length:894
## 1st Qu.: -7.187 1st Qu.: 1.48 1st Qu.:1.159e+09 Class :character
## Median : 21.726 Median : 18.69 Median :1.159e+09 Mode :character
## Mean : 21.259 Mean : 18.70 Mean :1.159e+09
## 3rd Qu.: 51.144 3rd Qu.: 40.40 3rd Qu.:1.159e+09
## Max. : 179.210 Max. : 74.32 Max. :1.159e+09
##
## name_ar name_bn name_de name_en
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## name_es name_fa name_fr name_el
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## name_he name_hi name_hu name_id
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## name_it name_ja name_ko name_nl
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## name_pl name_pt name_ru name_sv
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## name_tr name_uk name_ur name_vi
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## name_zh name_zht fclass_iso tlc_diff
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## fclass_tlc fclass_us fclass_fr fclass_ru
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## fclass_es fclass_cn fclass_tw fclass_in
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## fclass_np fclass_pk fclass_de fclass_gb
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## fclass_br fclass_il fclass_ps fclass_sa
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## fclass_eg fclass_ma fclass_pt fclass_ar
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## fclass_jp fclass_ko fclass_vn fclass_tr
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## fclass_id fclass_pl fclass_gr fclass_it
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## fclass_nl fclass_se fclass_bd fclass_ua
## Length:894 Length:894 Length:894 Length:894
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Entity Year Prevalence
## Length:894 Min. :2000 Min. : 0.00
## Class :character 1st Qu.:2005 1st Qu.:13.00
## Mode :character Median :2010 Median :23.30
## Mean :2010 Mean :22.59
## 3rd Qu.:2015 3rd Qu.:31.50
## Max. :2020 Max. :68.50
## NA's :79
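In wide `summary()` output it can be hard to tell which column an `NA's` line belongs to. Counting missing values per column makes this unambiguous. A minimal sketch on a hypothetical data frame (on the real data, the same call would be applied to `plot_data`):

```r
# Hypothetical data with gaps in two columns
df <- data.frame(
  Entity = c("Zimbabwe", "Kosovo", "Chile"),
  Year = c(2010, NA, 2015),
  Prevalence = c(15.2, NA, NA)
)

# Number of NA values in each column
na_counts <- colSums(is.na(df))
print(na_counts)

# Show only the columns that actually contain missing values
print(na_counts[na_counts > 0])
```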
# Find rows in the merged map_data where Prevalence is NA
unmatched <- map_data %>%
filter(is.na(Prevalence))
head(unmatched) # Check if these rows correspond to Somaliland, Kosovo, etc.
## Simple feature collection with 0 features and 171 fields
## Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
## Geodetic CRS: WGS 84
## [1] featurecla scalerank labelrank sovereignt sov_a3 adm0_dif
## [7] level type tlc admin adm0_a3 geou_dif
## [13] geounit gu_a3 su_dif subunit su_a3 brk_diff
## [19] name name_long brk_a3 brk_name brk_group abbrev
## [25] postal formal_en formal_fr name_ciawf note_adm0 note_brk
## [31] name_sort name_alt mapcolor7 mapcolor8 mapcolor9 mapcolor13
## [37] pop_est pop_rank pop_year gdp_md gdp_year economy
## [43] income_grp fips_10 iso_a2 iso_a2_eh iso_a3 iso_a3_eh
## [49] iso_n3 iso_n3_eh un_a3 wb_a2 wb_a3 woe_id
## [55] woe_id_eh woe_note adm0_iso adm0_diff adm0_tlc adm0_a3_us
## [61] adm0_a3_fr adm0_a3_ru adm0_a3_es adm0_a3_cn adm0_a3_tw adm0_a3_in
## [67] adm0_a3_np adm0_a3_pk adm0_a3_de adm0_a3_gb adm0_a3_br adm0_a3_il
## [73] adm0_a3_ps adm0_a3_sa adm0_a3_eg adm0_a3_ma adm0_a3_pt adm0_a3_ar
## [79] adm0_a3_jp adm0_a3_ko adm0_a3_vn adm0_a3_tr adm0_a3_id adm0_a3_pl
## [85] adm0_a3_gr adm0_a3_it adm0_a3_nl adm0_a3_se adm0_a3_bd adm0_a3_ua
## [91] adm0_a3_un adm0_a3_wb continent region_un subregion region_wb
## [97] name_len long_len abbrev_len tiny homepart min_zoom
## [103] min_label max_label label_x label_y ne_id wikidataid
## [109] name_ar name_bn name_de name_en name_es name_fa
## [115] name_fr name_el name_he name_hi name_hu name_id
## [121] name_it name_ja name_ko name_nl name_pl name_pt
## [127] name_ru name_sv name_tr name_uk name_ur name_vi
## [133] name_zh name_zht fclass_iso tlc_diff fclass_tlc fclass_us
## [139] fclass_fr fclass_ru fclass_es fclass_cn fclass_tw fclass_in
## [145] fclass_np fclass_pk fclass_de fclass_gb fclass_br fclass_il
## [151] fclass_ps fclass_sa fclass_eg fclass_ma fclass_pt fclass_ar
## [157] fclass_jp fclass_ko fclass_vn fclass_tr fclass_id fclass_pl
## [163] fclass_gr fclass_it fclass_nl fclass_se fclass_bd fclass_ua
## [169] Entity Year Prevalence geometry
## <0 rows> (or 0-length row.names)
# Drop rows with no prevalence value
map_data <- map_data %>%
filter(!is.na(Prevalence))
# ISO3 codes in world that have no match in countries_data$Code
missing_in_countries_data <- setdiff(world$iso_a3, countries_data$Code)
print(missing_in_countries_data)
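`setdiff()` is directional: `setdiff(a, b)` lists elements of `a` that are missing from `b`, so the call above only reports map polygons that have no smoking data. A sketch with hypothetical code vectors (`"OWID_WRL"` is an assumed aggregate-style code used purely for illustration) that checks both directions:

```r
# Hypothetical ISO3 code vectors
world_codes <- c("ZWE", "VEN", "GBR", "-99")
data_codes <- c("ZWE", "GBR", "OWID_WRL")

# In the map but not in the smoking data (polygons left unfilled)
map_only <- setdiff(world_codes, data_codes)

# In the smoking data but not in the map (rows an inner join would drop)
data_only <- setdiff(data_codes, world_codes)

print(map_only)
print(data_only)
```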
## [1] "VEN" "VAT" "FSM" "MNP" "VIR" "GUM" "ASM" "PRI" "SGS" "IOT" "SHN" "PCN"
## [13] "AIA" "FLK" "CYM" "BMU" "VGB" "TCA" "MSR" "JEY" "GGY" "IMN" "ARE" "TTO"
## [25] "TJK" "TWN" "SYR" "SUR" "SSD" "SDN" "SOM" "-99" "SMR" "VCT" "LCA" "KNA"
## [37] "NIC" "NIU" "COK" "ABW" "CUW" "ESH" "MCO" "MKD" "LIE" "LBY" "PSE" "HND"
## [49] "GIN" "GRD" "GAB" "SPM" "WLF" "MAF" "BLM" "PYF" "NCL" "ATF" "ALA" "GNQ"
## [61] "DMA" "DJI" "GRL" "FRO" "MAC" "HKG" "CAF" "BTN" "HMD" "NFK" "ATG" "AGO"
## [73] "ATA" "SXM"
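# Rebuild map_data, keeping only countries present in both datasets,
# and re-apply the NA filter (reassigning map_data discards the earlier one)
map_data <- world %>%
inner_join(countries_data, by = c("iso_a3" = "Code")) %>%
filter(!is.na(Prevalence))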
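An inner join keeps only rows whose key appears in both tables, and because `countries_data` holds one row per country per year, each matching polygon is repeated once per year. A base-R sketch using `merge()` (its default `all = FALSE` gives an inner join) on hypothetical tables; the names and values are invented:

```r
# Hypothetical map attributes (one row per country)
world_df <- data.frame(iso_a3 = c("ZWE", "VEN", "GBR"),
                       name = c("Zimbabwe", "Venezuela", "United Kingdom"))

# Hypothetical smoking data (one row per country per year)
smoke_df <- data.frame(Code = c("ZWE", "ZWE", "GBR", "GBR"),
                       Year = c(2005, 2010, 2005, 2010),
                       Prevalence = c(20.1, 18.4, 25.0, 21.3))

# Inner join on differently named key columns; VEN has no match and is dropped
joined <- merge(world_df, smoke_df, by.x = "iso_a3", by.y = "Code")

print(joined)
```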
# Recreate plot_data from the rebuilt map_data, then check its structure
plot_data <- map_data %>%
st_set_geometry(NULL)
str(plot_data)
# Ensure no duplicate country-year pairs
duplicates <- plot_data %>%
group_by(iso_a3, Year) %>%
filter(n() > 1)
print(duplicates) # Ideally empty; the six rows returned all carry Natural Earth's placeholder code -99
## # A tibble: 6 × 171
## # Groups: iso_a3, Year [1]
## featurecla scalerank labelrank sovereignt sov_a3 adm0_dif level type tlc
## <chr> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 Admin-0 coun… 1 5 Somaliland SOL 0 2 Sove… 1
## 2 Admin-0 coun… 1 6 Kosovo KOS 0 2 Disp… 1
## 3 Admin-0 coun… 1 6 Northern … CYN 0 2 Sove… 1
## 4 Admin-0 coun… 5 5 Australia AU1 1 2 Depe… 1
## 5 Admin-0 coun… 5 5 Australia AU1 1 2 Depe… 1
## 6 Admin-0 coun… 1 5 Kashmir KAS 0 2 Inde… <NA>
## # ℹ 162 more variables: admin <chr>, adm0_a3 <chr>, geou_dif <int>,
## # geounit <chr>, gu_a3 <chr>, su_dif <int>, subunit <chr>, su_a3 <chr>,
## # brk_diff <int>, name <chr>, name_long <chr>, brk_a3 <chr>, brk_name <chr>,
## # brk_group <chr>, abbrev <chr>, postal <chr>, formal_en <chr>,
## # formal_fr <chr>, name_ciawf <chr>, note_adm0 <chr>, note_brk <chr>,
## # name_sort <chr>, name_alt <chr>, mapcolor7 <int>, mapcolor8 <int>,
## # mapcolor9 <int>, mapcolor13 <int>, pop_est <dbl>, pop_rank <int>, …
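The six rows above (Somaliland, Kosovo, Northern Cyprus, two Australian dependencies and Kashmir) share iso_a3 = "-99", which Natural Earth uses when a territory has no ISO 3166 code, so they collapse into a single country-year group. A minimal base-R sketch of dropping those placeholder rows and re-running the check is shown below; the data frame `plot_data_demo` and its entities ("Country A", "Territory X") are hypothetical stand-ins, not values from the dataset.

```r
# Hypothetical stand-in for plot_data (made-up entities and values)
plot_data_demo <- data.frame(
  Entity     = c("Country A", "Country A", "Territory X", "Territory Y"),
  iso_a3     = c("AAA", "AAA", "-99", "-99"),
  Year       = c(2000, 2005, 2000, 2000),
  Prevalence = c(25.0, 23.5, 30.0, 35.0)
)

# anyDuplicated() gives the index of the first repeated (iso_a3, Year) pair, or 0
before <- anyDuplicated(plot_data_demo[, c("iso_a3", "Year")])

# Drop rows carrying Natural Earth's "-99" placeholder code
clean_data <- plot_data_demo[plot_data_demo$iso_a3 != "-99", ]

# Re-run the duplicate check on the cleaned data
n_dup <- anyDuplicated(clean_data[, c("iso_a3", "Year")])
print(c(before = before, after = n_dup))
```

In the project's dplyr pipeline the same step would be `filter(iso_a3 != "-99")`. Because plotly's `locationmode = "ISO-3"` cannot place a "-99" code on the map anyway, dropping these rows should not change what is drawn.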
# Check for missing prevalence values
missing_prevalence <- plot_data %>%
filter(is.na(Prevalence))
print(missing_prevalence) # Check if some years have missing data
## [1] featurecla scalerank labelrank sovereignt sov_a3 adm0_dif
## [7] level type tlc admin adm0_a3 geou_dif
## [13] geounit gu_a3 su_dif subunit su_a3 brk_diff
## [19] name name_long brk_a3 brk_name brk_group abbrev
## [25] postal formal_en formal_fr name_ciawf note_adm0 note_brk
## [31] name_sort name_alt mapcolor7 mapcolor8 mapcolor9 mapcolor13
## [37] pop_est pop_rank pop_year gdp_md gdp_year economy
## [43] income_grp fips_10 iso_a2 iso_a2_eh iso_a3 iso_a3_eh
## [49] iso_n3 iso_n3_eh un_a3 wb_a2 wb_a3 woe_id
## [55] woe_id_eh woe_note adm0_iso adm0_diff adm0_tlc adm0_a3_us
## [61] adm0_a3_fr adm0_a3_ru adm0_a3_es adm0_a3_cn adm0_a3_tw adm0_a3_in
## [67] adm0_a3_np adm0_a3_pk adm0_a3_de adm0_a3_gb adm0_a3_br adm0_a3_il
## [73] adm0_a3_ps adm0_a3_sa adm0_a3_eg adm0_a3_ma adm0_a3_pt adm0_a3_ar
## [79] adm0_a3_jp adm0_a3_ko adm0_a3_vn adm0_a3_tr adm0_a3_id adm0_a3_pl
## [85] adm0_a3_gr adm0_a3_it adm0_a3_nl adm0_a3_se adm0_a3_bd adm0_a3_ua
## [91] adm0_a3_un adm0_a3_wb continent region_un subregion region_wb
## [97] name_len long_len abbrev_len tiny homepart min_zoom
## [103] min_label max_label label_x label_y ne_id wikidataid
## [109] name_ar name_bn name_de name_en name_es name_fa
## [115] name_fr name_el name_he name_hi name_hu name_id
## [121] name_it name_ja name_ko name_nl name_pl name_pt
## [127] name_ru name_sv name_tr name_uk name_ur name_vi
## [133] name_zh name_zht fclass_iso tlc_diff fclass_tlc fclass_us
## [139] fclass_fr fclass_ru fclass_es fclass_cn fclass_tw fclass_in
## [145] fclass_np fclass_pk fclass_de fclass_gb fclass_br fclass_il
## [151] fclass_ps fclass_sa fclass_eg fclass_ma fclass_pt fclass_ar
## [157] fclass_jp fclass_ko fclass_vn fclass_tr fclass_id fclass_pl
## [163] fclass_gr fclass_it fclass_nl fclass_se fclass_bd fclass_ua
## [169] Entity Year Prevalence
## <0 rows> (or 0-length row.names)
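The filter above returns every row with a missing value but not how those rows are spread across years. As a complement, here is a small base-R sketch that counts missing estimates per year; the data frame `demo` is hypothetical, and in the project it would be replaced by `plot_data`.

```r
# Hypothetical data with one missing estimate
demo <- data.frame(
  Entity     = c("Country A", "Country A", "Country B", "Country B"),
  Year       = c(2000, 2005, 2000, 2005),
  Prevalence = c(25.0, NA, 18.5, 17.2)
)

# Sum the TRUE values of is.na() within each year
missing_by_year <- tapply(is.na(demo$Prevalence), demo$Year, sum)
print(missing_by_year)
```

For plot_data every count would be zero, consistent with the 0-row result above.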
# Check the distribution of years and countries
table(plot_data$Year)
##
## 2000 2005 2010 2015 2020
## 163 163 163 163 163
table(plot_data$iso_a3)
##
## -99 ABW AFG AGO AIA ALA ALB AND ARE ARG ARM ASM ATA ATF ATG AUS AUT AZE BDI BEL
## 6 1 5 1 1 1 5 5 1 5 5 1 1 1 1 5 5 5 5 5
## BEN BFA BGD BGR BHR BHS BIH BLM BLR BLZ BMU BOL BRA BRB BRN BTN BWA CAF CAN CHE
## 5 5 5 5 5 5 5 1 5 5 1 5 5 5 5 1 5 1 5 5
## CHL CHN CIV CMR COD COG COK COL COM CPV CRI CUB CUW CYM CYP CZE DEU DJI DMA DNK
## 5 5 5 5 5 5 1 5 5 5 5 5 1 1 5 5 5 1 1 5
## DOM DZA ECU EGY ERI ESH ESP EST ETH FIN FJI FLK FRA FRO FSM GAB GBR GEO GGY GHA
## 5 5 5 5 5 1 5 5 5 5 5 1 5 1 1 1 5 5 1 5
## GIN GMB GNB GNQ GRC GRD GRL GTM GUM GUY HKG HMD HND HRV HTI HUN IDN IMN IND IOT
## 1 5 5 1 5 1 1 5 1 5 1 1 1 5 5 5 5 1 5 1
## IRL IRN IRQ ISL ISR ITA JAM JEY JOR JPN KAZ KEN KGZ KHM KIR KNA KOR KWT LAO LBN
## 5 5 5 5 5 5 5 1 5 5 5 5 5 5 5 1 5 5 5 5
## LBR LBY LCA LIE LKA LSO LTU LUX LVA MAC MAF MAR MCO MDA MDG MDV MEX MHL MKD MLI
## 5 1 1 1 5 5 5 5 5 1 1 5 1 5 5 5 5 5 1 5
## MLT MMR MNE MNG MNP MOZ MRT MSR MUS MWI MYS NAM NCL NER NFK NGA NIC NIU NLD NOR
## 5 5 5 5 1 5 5 1 5 5 5 5 1 5 1 5 1 1 5 5
## NPL NRU NZL OMN PAK PAN PCN PER PHL PLW PNG POL PRI PRK PRT PRY PSE PYF QAT ROU
## 5 5 5 5 5 5 1 5 5 5 5 5 1 5 5 5 1 1 5 5
## RUS RWA SAU SDN SEN SGP SGS SHN SLB SLE SLV SMR SOM SPM SRB SSD STP SUR SVK SVN
## 5 5 5 1 5 5 1 1 5 5 5 1 1 1 5 1 5 1 5 5
## SWE SWZ SXM SYC SYR TCA TCD TGO THA TJK TKM TLS TON TTO TUN TUR TUV TWN TZA UGA
## 5 5 1 5 1 1 5 5 5 1 5 5 5 1 5 5 5 1 5 5
## UKR URY USA UZB VAT VCT VEN VGB VIR VNM VUT WLF WSM YEM ZAF ZMB ZWE
## 5 5 5 5 1 1 1 1 1 5 5 1 5 5 5 5 5
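Most codes in the table appear five times (one row per survey year), but several appear only once and "-99" appears six times. A minimal sketch for listing the codes that are not observed in all five years follows; the vector `iso_codes` is a hypothetical stand-in for `plot_data$iso_a3`.

```r
# Hypothetical country codes standing in for plot_data$iso_a3
iso_codes <- c(rep("AAA", 5), "BBB", rep("CCC", 5))

counts <- table(iso_codes)
expected_years <- 5  # 2000, 2005, 2010, 2015 and 2020

# Codes seen in fewer than all five survey years
incomplete <- names(counts)[counts < expected_years]
print(incomplete)
```

Codes with exactly one row, such as ABW or AIA in the table above, would show up here and can be checked before reading year-to-year changes off the animation.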
Entity is the name of the country or region. Code is the entity code that Our World in Data assigns to each country or region. Year is the year of the estimate. Prevalence.of.current.tobacco.use....of.adults. (renamed Prevalence in plot_data) is the age-standardized percentage of adults aged 15 and over who currently use any tobacco product.
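The long dotted column name comes from R itself: by default, `read.csv()` passes headers through `make.names()`, which replaces each space, bracket and percent sign with a dot. The sketch below reproduces this using a one-line CSV, assuming the Our World in Data header reads "Prevalence of current tobacco use (% of adults)". The value 20.5 and "Country A" are hypothetical.

```r
# One-row CSV mimicking the Our World in Data header (value is hypothetical)
csv_text <- "Entity,Year,Prevalence of current tobacco use (% of adults)
Country A,2000,20.5"

# read.csv() applies make.names() to the headers (check.names = TRUE by default)
smoke <- read.csv(text = csv_text)
print(names(smoke)[3])

# Rename the mangled column to the shorter name used in plot_data
names(smoke)[names(smoke) == "Prevalence.of.current.tobacco.use....of.adults."] <- "Prevalence"
print(names(smoke))
```

With dplyr, the equivalent step is `rename(Prevalence = `Prevalence.of.current.tobacco.use....of.adults.`)`.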
# Keep the five survey years (2000 to 2020) used as animation frames
combined_data <- plot_data %>% filter(Year %in% c(2000, 2005, 2010, 2015, 2020))
# Plot using combined data
plot <- plot_ly(
data = combined_data,
type = "choropleth",
locations = ~iso_a3,
locationmode = "ISO-3",
z = ~Prevalence,
frame = ~Year,
text = ~paste("Country:", Entity, "<br>Prevalence:", Prevalence, "%"),
colorscale = "Reds",
zmin = 0,
zmax = 68.5,
showscale = TRUE
) %>%
layout(
title = "Global Smoking Prevalence (Subset Test)",
geo = list(
projection = list(type = "mercator"),
showcoastlines = TRUE,
coastlinecolor = "grey"
)
)
plot
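Both plots hard-code `zmax = 68.5`, presumably the highest prevalence in the data (an assumption). Plotly places colorscale stops on a 0 to 1 scale that spans `zmin` to `zmax`, so fixing the range also keeps each colour tied to the same prevalence in every animation frame. The following sketch derives the range from the data and shows which prevalence a stop corresponds to. The vector `prevalence` and the helper `scale_position()` are hypothetical illustrations rather than part of the project code.

```r
# Hypothetical values standing in for combined_data$Prevalence
prevalence <- c(12.4, 68.5, NA, 30.0)

# Derive the colour range from the data instead of hard-coding it
zmin <- 0
zmax <- max(prevalence, na.rm = TRUE)

# Hypothetical helper: position of a prevalence value on the 0-1 colorscale
scale_position <- function(z, zmin, zmax) (z - zmin) / (zmax - zmin)

# Prevalence that sits at the 0.5 colorscale stop
midpoint <- zmin + 0.5 * (zmax - zmin)
print(midpoint)
```

When zmax = 68.5, the orange stop at 0.5 in the custom colorscale corresponds to a prevalence of 34.25%. You can pass `zmin = zmin, zmax = zmax` to `plot_ly()` instead of the literal values.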
plot <- plot_ly(
data = combined_data,
type = "choropleth",
locations = ~iso_a3,
locationmode = "ISO-3",
z = ~Prevalence,
frame = ~Year,
text = ~paste(
"Country:", Entity,
ifelse(is.na(Prevalence), "<br>No Data", paste0("<br>Prevalence: ", Prevalence, "%"))
),
colorscale = list(
c(0, "#ffeda0"), # Low prevalence: pale yellow
c(0.5, "#feb24c"), # Medium prevalence: orange
c(1, "#67000d") # High prevalence: dark red
),
zmin = 0,
zmax = 68.5,
showscale = TRUE,
marker = list(line = list(color = "grey", width = 0.5)) # Border for the countries
) %>%
layout(
title = "Global Prevalence of Current Tobacco Use (% of Adults Aged 15+), 2000-2020",
geo = list(
projection = list(type = "equirectangular"), # Rectangular projection
showcoastlines = TRUE, # Show coastlines
coastlinecolor = "grey", # Set coastline border color
showcountries = TRUE, # Ensure country borders are shown
countrycolor = "grey", # Set country border color
showland = TRUE, # Show land explicitly
landcolor = "white", # Set land colour to white
showocean = TRUE, # Enable ocean rendering
oceancolor = "lightblue", # Set ocean colour to light blue
showframe = FALSE # Optionally remove frame border
),
annotations = list(
list(
x = 0.5, # Horizontal position: centred across the plot area
y = -0.1, # Vertical position: just below the map
xref = "paper", # Interpret x in paper coordinates (0 to 1 across the plot area)
yref = "paper", # Interpret y in paper coordinates (0 to 1 up the plot area)
text = "Note: White regions indicate missing data.", # Annotation text
showarrow = FALSE, # Show the text only, without an arrow
font = list(size = 12, color = "black"), # Font size and color
align = "left"
)
)
)
# Display the plot
plot
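The hover text combines `paste()` with `ifelse()`. Because the missing-value check returned 0 rows, the "No Data" branch never fires for this dataset. Countries without an estimate are simply left uncoloured and appear as white land, which is the case the annotation covers. The sketch below runs the same labelling logic outside plotly, with optional rounding to one decimal place added. The inputs `entity` and `prevalence` are hypothetical.

```r
# Hypothetical inputs; NA marks a missing estimate
entity     <- c("Country A", "Country B")
prevalence <- c(23.456, NA)

# Same logic as the text = ~paste(...) argument, with rounding added
hover_text <- paste(
  "Country:", entity,
  ifelse(is.na(prevalence), "<br>No Data",
         paste0("<br>Prevalence: ", round(prevalence, 1), "%"))
)
print(hover_text)
```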